Mixture input transformations for adaptation of hybrid connectionist speech recognizers
Abstract
We extend the input transformation approach for adapting hybrid connectionist speech recognizers so that multiple transformations can be trained. Previous work has shown the efficacy of the linear input transformation approach for speaker adaptation [1][2][3] but has focused only on training global transformations. A single global transformation is clearly suboptimal because it assumes that one transformation is appropriate for every region of the acoustic feature input space, that is, for every phonetic class, microphone, and noise level. In this paper, we propose a new algorithm to train mixtures of transformation networks (MTNs) in the hybrid connectionist recognition framework. The approach partitions the acoustic feature space into R regions and trains an input transformation for each region. The transformations are combined probabilistically according to the degree to which the acoustic features belong to each region, with the combination weights derived from a separate acoustic gating network (AGN). We apply the new algorithm to nonnative speaker adaptation and present recognition results on the 1994 WSJ Spoke 3 development set. The MTN technique can also be applied to noise- or microphone-robust recognition and to other, nonspeech neural network pattern recognition problems.
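The probabilistic combination described above can be sketched as a forward pass: for a feature vector x, each region r applies its own transform, and a gate assigns soft memberships g_r(x) that sum to 1, giving x_hat = sum_r g_r(x) (A_r x + b_r). This is a minimal sketch under several assumptions that are not taken from the abstract: each region's transform is affine (A_r, b_r), the AGN is reduced to a single softmax layer (the hypothetical parameters `W_g` and `c_g`), the mixture is applied to the transformed features before they reach the fixed speaker-independent network, and every transform starts at the identity. The class name `MixtureTransformationNetwork` is illustrative only, and the sketch does not cover how the transforms and the AGN are trained.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def softmax(z: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def affine(A: Matrix, b: Sequence[float], x: Sequence[float]) -> Vector:
    """Compute A x + b, with A stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]


class MixtureTransformationNetwork:
    """Hypothetical sketch: R affine input transforms mixed by a softmax gate.

    Assumptions (not from the paper): affine per-region transforms, a one-layer
    softmax gate standing in for the AGN, and mixing of the transformed feature
    vectors before the fixed speaker-independent network.
    """

    def __init__(self, dim: int, regions: int):
        identity = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        # Assumed initialization: every region starts at the identity transform,
        # so an untrained MTN passes features through unchanged.
        self.A = [[row[:] for row in identity] for _ in range(regions)]
        self.b = [[0.0] * dim for _ in range(regions)]
        # Gate parameters: one score per region, computed as W_g x + c_g.
        self.W_g = [[0.0] * dim for _ in range(regions)]
        self.c_g = [0.0] * regions

    def gate(self, x: Sequence[float]) -> Vector:
        """Soft region memberships g_r(x); positive and summing to 1."""
        return softmax(affine(self.W_g, self.c_g, x))

    def transform(self, x: Sequence[float]) -> Vector:
        """Return x_hat = sum_r g_r(x) * (A_r x + b_r)."""
        g = self.gate(x)
        out = [0.0] * len(x)
        for g_r, A_r, b_r in zip(g, self.A, self.b):
            y_r = affine(A_r, b_r, x)
            out = [o + g_r * y for o, y in zip(out, y_r)]
        return out


if __name__ == "__main__":
    mtn = MixtureTransformationNetwork(dim=2, regions=2)
    mtn.b[1] = [1.0, -1.0]                # region 1 shifts the features
    mtn.W_g = [[5.0, 0.0], [-5.0, 0.0]]   # region 0 favors x[0] > 0, region 1 favors x[0] < 0
    print(mtn.transform([0.0, 0.0]))      # prints [0.5, -0.5]
```

At the origin both regions receive weight 0.5, so the output is the average of the identity and the shifted transform; for inputs deep inside one region, the gate approaches 1 for that region and its transform dominates. A single global transformation is the special case R = 1.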
Similar resources
Maximum-likelihood stochastic-transformation adaptation of hidden Markov models
The recognition accuracy in recent large vocabulary automatic speech recognition (ASR) systems is highly related to the existing mismatch between the training and testing sets. For example, dialect differences across the training and testing speakers result in a significant degradation in recognition performance. Some popular adaptation approaches improve the recognition performance of speech r...
Context dependent modelling approaches for hybrid speech recognizers
Speech recognition based on connectionist approaches is one of the most successful alternatives to widespread Gaussian systems. One of the main claims against hybrid recognizers is the increased complexity for context-dependent phone modeling, which is a key aspect in medium to large size vocabulary tasks. In this paper, we investigate the use of context-dependent triphone models in a connectio...
Adaptation of hidden Markov models using multiple stochastic transformations
The recognition accuracy in recent large vocabulary Automatic Speech Recognition (ASR) systems is highly related to the existing mismatch between the training and test sets. For example, dialect differences across the training and testing speakers result in a significant degradation in recognition performance. Some popular adaptation approaches improve the recognition performance of speech recogn...
Large vocabulary speech recognition with context dependent MMI-connectionist / HMM systems using the WSJ database
In this paper we present a context dependent hybrid MMI-connectionist / Hidden Markov Model (HMM) speech recognition system for the Wall Street Journal (WSJ) database. The hybrid system is built with a neural network, which is used as a vector quantizer (VQ), and an HMM with discrete probability density functions, which has the advantage of faster decoding. The neural network is trained on an...
Smoothed local adaptation of connectionist systems
ABBOT is the hybrid connectionist hidden Markov model (HMM) large vocabulary continuous speech recognition system developed at Cambridge University Engineering Department. ABBOT makes effective use of the linear input network (LIN) adaptation technique to achieve speaker and channel adaptation. Although the LIN is effective at adapting to new speakers or a new environment (e.g. a different microph...